Diversity versus Quality in Classification Ensembles Based on Feature Selection
Abstract
Feature subset selection has emerged as a useful technique for creating diversity in ensembles, particularly in classification ensembles. In this paper we argue that this diversity needs to be monitored during the creation of the ensemble. We propose the entropy of the outputs of the ensemble members as a measure of ensemble diversity. Further, we show that the associated conditional entropy works well as a loss function (error measure), and that the entropy in the ensemble is a good predictor of the reduction in error due to the ensemble. These measures are evaluated on a medical prediction problem and are shown to predict the performance of the ensemble well. We also show that the entropy measure of diversity has the added advantage that it seems to model the change in diversity with the size of the ensemble.
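As a rough illustration of an entropy-based diversity measure, the sketch below computes the Shannon entropy of the class labels that ensemble members assign to each example and averages it over the examples. The abstract does not give the paper's exact formula, so this definition, the function names `vote_entropy` and `ensemble_diversity`, and the toy predictions are all assumptions made for illustration only.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Shannon entropy (in bits) of the labels predicted for one example.

    `votes` holds one predicted class label per ensemble member.
    Assumed definition; the paper's exact measure may differ.
    """
    n = len(votes)
    counts = Counter(votes)
    # p * log2(1/p) summed over the labels that actually occur
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def ensemble_diversity(predictions):
    """Mean vote entropy over all examples.

    `predictions[i][j]` is the label predicted by member i for example j.
    """
    per_example_votes = list(zip(*predictions))
    entropies = [vote_entropy(v) for v in per_example_votes]
    return sum(entropies) / len(entropies)


# Hypothetical outputs of four ensemble members on four examples.
preds = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]

print(f"mean vote entropy: {ensemble_diversity(preds):.4f} bits")
```

Under this definition the value is zero when all members agree on every example and grows as their votes spread more evenly across classes, which is the behaviour one would want from a diversity measure.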
Similar resources
A New Framework for Distributed Multivariate Feature Selection
Feature selection is considered an important issue in the classification domain. Selecting good features by maximizing relevance to the class label and minimizing redundancy among features improves classification accuracy. However, most current feature selection algorithms work only in a centralized setting. In this paper, we suggest a distributed version of the mRMR featu...
Overfitting and Diversity in Classification Ensembles based on Feature Selection
This paper addresses Wrapper-like approaches to feature subset selection and the production of classifier ensembles based on members with different feature subsets. The paper starts with the observation that if an insufficient amount of data is used to guide the Wrapper search then the feature selection will overfit the data. If the objective of the feature selection exercise is to build a bett...
Ensembles of Instance Selection Methods based on Feature Subset
In this paper the application of ensembles of instance selection algorithms to improve the quality of dataset size reduction is evaluated. To ensure diversity among the submodels, selection of feature subsets was considered. In the experiments the Condensed Nearest Neighbor (CNN) and Edited Nearest Neighbor (ENN) algorithms were evaluated as basic instance selection methods. The results sh...
A novel method based on a combination of deep learning algorithm and fuzzy intelligent functions in order to classification of power quality disturbances in power systems
Automatic classification of power quality disturbances is the foundation for dealing with power quality problems. Traditionally, the identification of power quality disturbances is divided into three independent stages: signal analysis, feature selection and classification. However, there are some inherent defects in signal analysis and the procedure of manual fe...
Diversity in Ensemble Feature Selection
Ensembles of learnt models constitute one of the main current directions in machine learning and data mining. Ensembles allow us to achieve higher accuracy, which is often not achievable with single models. It was shown theoretically and experimentally that in order for an ensemble to be effective, it should consist of high-accuracy base classifiers that should have high diversity in their pred...